- STATS 3 MENU
-
- REGRESSION
- For the tests that follow, all except LOGIT regression have
- similar input and output structures. You will be asked for the
- variables that are the independent variables and for the one
- dependent variable. You will then be asked for the variable
- (column) into which the calculated values should be placed. The
- program does not place the residuals in a variable (column), as
- this would restrict the number of variables that could actually
- be used in the regression. To get the residuals, simply subtract
- the calculated data from the actual in the data editor. The
- differences lie in additional parts of the regressions.
-
- -Multiple regression is a traditional regression.
-
- -Ridge regression will require the entry of a ridge factor, which
- should be small and between 0 and 1 (most often below .2).
-
- -Stepwise regression is like multiple regression, except that you
- specify all independent variables to be considered. The program
- decides which of these to actually use in the regression.
-
- -Cochran refers to a regression done using the Cochrane-Orcutt
- procedure. A "Cochran" factor between 0 and 1 must be used.
- This type of regression actually uses a part of the previous point
- in the calculation. If the Cochran factor is 1, then the regression
- is actually calculated upon the first differences of the
- variables.
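- The quasi-differencing step above can be sketched in a few lines of
- Python (a minimal numpy illustration, not B/STAT's actual code; the
- function name and the simulated data are invented):

```python
import numpy as np

def cochrane_orcutt_pass(x, y, rho):
    """One Cochrane-Orcutt pass: quasi-difference both series with the
    factor rho, then fit ordinary least squares on the transform.
    With rho = 1 this becomes a regression on first differences."""
    x_star = x[1:] - rho * x[:-1]
    y_star = y[1:] - rho * y[:-1]
    X = np.column_stack([np.ones_like(x_star), x_star])
    beta, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return beta                      # [intercept, slope]

# Simulated data: y = 1 + 2x with AR(1) errors (rho = 0.8).
rng = np.random.default_rng(0)
x = np.arange(50.0)
e = np.zeros(50)
for t in range(1, 50):
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.1)
y = 1.0 + 2.0 * x + e
beta = cochrane_orcutt_pass(x, y, rho=0.8)   # slope close to 2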
-
- -Huber regression is used to reduce the weight given to outliers
- in the data. You will need to specify two additional pieces of
- data. The first is the variable into which the program places the
- weights, and the second is the value of the residual at which the
- weights should start to be changed. This procedure can only be
- used after first doing a traditional regression.
-
- -Weighted regression requires you to specify a weight variable
- before execution.
-
- -Chow regression is a simple modification of multiple regression.
- It is used to see if the regression parameters are constant over
- the scope of the data variables. You will have to specify the
- number of points to keep in the first sample.
-
- -LOGIT regression is used when the dependent variable is to be
- constrained to a value above 0 but below 1. LOGIT setup converts
- unsummarized data to the form required by the regression program.
- (Save original data first!)
-
- -PROBIT regression is similar to LOGIT regression. The difference
- is the type of curve that is fit to the data. The logit fits a
- logistic curve to the data while the probit fits a normal
- distribution to the data. Except at the extremes (close to zero or
- 1) the difference between the results is very slight. PROBIT setup
- converts unsummarized data to the form required by the regression
- program. Traditionally, in the probit transform, 5 was added to
- the normal deviate to avoid negative numbers. I have dispensed
- with that addition to simplify the result. I think that in the
- 1990s we all are comfortable with negatives. As a result the
- constant from B/STAT will be 5 lower than from traditional
- packages.
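- The two transforms can be written in a couple of lines using only
- Python's standard library (an illustration of the conventions just
- described; the function names are mine, not B/STAT's):

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit transform: the log-odds of a proportion 0 < p < 1."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit transform: the normal deviate for p, WITHOUT the
    traditional +5 offset, matching the convention described above."""
    return NormalDist().inv_cdf(p)

half = probit(0.5)       # 0: with no offset, values below p=0.5 are negative
upper = probit(0.975)    # about 1.96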
-
- -Non Linear regression refers to a regression where the form is
- not linear in the parameters. In such a case the usual mathematical
- procedures do not work. In this case you will be asked for the
- dependent variable, a variable containing standard errors of the
- measured points, and a variable to place the results in. You will
- not be asked for the independent variables. Instead you will be
- asked to enter the equation. This equation is of the form Y=f(X)
- except that you will use the column letters ("a" "b" etc) for the
- independent variables. Each parameter that you wish to estimate
- will have the form "PARM1" "PARM2" etc.
- If we wanted to estimate "a" and "b" in the following formula
-
- Y=a(1-EXP(-bX))
-
- we would enter
-
- PARM1*(1-EXP(-1*PARM2*a))
-
- if the X variable was in column "a" of the spreadsheet.
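- For the curve above, a crude fit can be sketched without a special
- solver: grid-search over PARM2 and solve PARM1 in closed form at
- each candidate (illustration only; B/STAT's own solver is not
- described in this text, and the data here are invented):

```python
import numpy as np

def fit_saturation(x, y):
    """Fit y = a*(1 - exp(-b*x)) -- the PARM1/PARM2 example above --
    by a grid search over b, solving for a in closed form at each
    candidate.  A sketch, not B/STAT's actual algorithm."""
    best_a, best_b, best_sse = 0.0, 0.0, np.inf
    for b in np.linspace(0.05, 2.0, 400):
        g = 1.0 - np.exp(-b * x)              # model shape with a = 1
        a = float(g @ y / (g @ g))            # least-squares a given b
        sse = float(((y - a * g) ** 2).sum())
        if sse < best_sse:
            best_a, best_b, best_sse = a, b, sse
    return best_a, best_b

x = np.linspace(0.1, 5.0, 40)
y = 3.0 * (1.0 - np.exp(-0.7 * x))            # noise-free: a=3, b=0.7
a_hat, b_hat = fit_saturation(x, y)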
-
- -Principal Components is not actually a regression method at all.
- It is a process used to reduce the number of variables needed to
- explain the variation in the data. The resultant variables are
- orthogonal; that is, the correlation between any two variables is
- 0. Regression can often then be carried out against these pseudo-
- variables. The process is destructive, in that it wipes out the
- existing variables. Each new one is a linear combination of the
- original variables.
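- The orthogonality property can be checked with a small sketch
- (eigen-decomposition of the covariance matrix in numpy; this mirrors
- the idea of the procedure, not necessarily B/STAT's computation):

```python
import numpy as np

def principal_components(data):
    """Return the principal-component scores: orthogonal linear
    combinations of the centred columns, largest variance first."""
    centred = data - data.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(vals)[::-1]
    return centred @ vecs[:, order]

# Two strongly correlated columns reduce to orthogonal components.
rng = np.random.default_rng(1)
base = rng.normal(size=100)
data = np.column_stack([base, 2 * base + rng.normal(scale=0.1, size=100)])
scores = principal_components(data)
corr = np.corrcoef(scores, rowvar=False)[0, 1]   # essentially 0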
-
- -Correlation matrix shows the correlation between a group of
- variables, rather than doing a full regression. This is often done
- to look at the effects of multi-collinearity on the data.
-
- TIME SERIES
- These are methods of smoothing or projecting data. They are often
- used in combination with other procedures.
-
- -Moving average requires you to choose the variable and the period
- of the moving average. As well, you must select a variable into
- which the averaged variable will be placed.
-
- -Geometric moving average requires the same input as linear moving
- average.
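- Both averages can be sketched as follows (illustration only; the
- choice of variables and columns in B/STAT works as described above):

```python
import numpy as np

def moving_average(x, period):
    """Arithmetic moving average over a fixed period."""
    return np.convolve(x, np.ones(period) / period, mode="valid")

def geometric_moving_average(x, period):
    """Geometric mean over the same window (values must be positive)."""
    return np.exp(moving_average(np.log(x), period))

x = np.array([1.0, 2.0, 4.0, 8.0])
lin = moving_average(x, 2)               # pairwise arithmetic means
geo = geometric_moving_average(x, 2)     # pairwise geometric means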
-
- -Fourier smoothing requires a variable to smooth and a variable to
- place the result. It also asks for the number of terms to be kept
- in the intermediate calculations. This value should be less than
- 50, usually less than 15. There must be no missing data for this
- procedure to work. Note that this can be a slow process.
-
- -Linear smoothing requires a variable to smooth and a variable to
- place the result. A linear regression is made assuming that the
- independent variable is a simple counter from 1 to the number of
- rows used. The equation is
-
- Y=a+b.t
-
- -Polynomial smoothing fits a power series to the data. In
- addition to the variable to smooth and the result variable you
- must input the degree of the polynomial. A power of 1 is a linear
- regression. A power of 2 fits the curve
-
- Y=a+b.t+c.t.t
-
- A power of 3 fits
-
- Y=a+b.t+c.t.t+d.t.t.t
-
- etc
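- The fit against the counter t = 1, 2, 3, ... can be sketched with
- numpy's polynomial routines (a hedged illustration; the degree-1
- case is the linear smoothing above):

```python
import numpy as np

def polynomial_smooth(y, degree):
    """Fit a degree-n polynomial in t = 1..len(y) and return the
    fitted values; degree 1 is the linear smoothing described above."""
    t = np.arange(1, len(y) + 1)
    return np.polyval(np.polyfit(t, y, degree), t)

y = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # exactly t**2
smoothed = polynomial_smooth(y, 2)           # degree 2 recovers it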
-
- -Exponential Form fits an equation such that
-
- Y=EXP(a+b.t)
-
- This is called exponential form to distinguish from exponential
- smoothing which is a totally different process.
-
- -S-Shape smoothing fits the following curve
-
- Y=EXP(a+b/t)
-
- Such a curve will rise and then approach EXP(a) if "b" is
- negative. If "b" is positive then the curve will drop to approach
- EXP(a).
-
- -Brown 1-way exponential smoothing is simple exponential smoothing.
- You will be asked to specify the variable to smooth, and a
- variable in which to store the result. In addition, you will need
- a smoothing constant (0 to 1) and a starting value. If you do not
- specify the starting value, the program will generate one. This
- process is not designed for data with a distinct trend line. If
- there is a distinct linear trend, then 2-way exponential smoothing
- should be used.
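- The recursion behind simple exponential smoothing is short (a
- sketch; when no start is supplied I fall back to the first
- observation, since the rule B/STAT uses to generate a starting
- value is not stated):

```python
def brown_smooth(x, alpha, start=None):
    """Brown's one-way (simple) exponential smoothing.  If no start
    is given, the first observation is used; the rule B/STAT uses to
    generate its starting value is not stated in the text."""
    s = x[0] if start is None else start
    out = []
    for value in x:
        s = alpha * value + (1.0 - alpha) * s
        out.append(s)
    return out

smoothed = brown_smooth([10.0, 12.0, 11.0, 13.0], alpha=0.5)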
-
- -Brown's 2-way exponential smoothing uses linear regression to
- estimate a starting value and trend. You must estimate the
- smoothing coefficient and variable to smooth, and variable for
- result.
-
- -Holt's 2-way exponential smoothing is similar to Brown's, except
- that a separate smoothing coefficient is used for the trend
- factor. Also you may enter initial values for the level and
- trend.
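- The separate level and trend updates can be sketched as (a standard
- form of Holt's recursion; B/STAT's initialisation details may
- differ):

```python
def holt_smooth(x, alpha, beta, level=None, trend=0.0):
    """Holt's two-way smoothing: separate coefficients for level
    (alpha) and trend (beta), with optional initial values."""
    if level is None:
        level = x[0]
    out = []
    for value in x:
        prev = level
        level = alpha * value + (1.0 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1.0 - beta) * trend
        out.append(level)
    return out

smoothed = holt_smooth([1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.5)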
-
- -Multiplicative exponential smoothing is almost identical to
- Holt's. The difference is that the trend factor is taken as a
- proportionate increase in value rather than a constant to add.
- Thus .02 does not mean that the trend is initially an increment
- of .02 but rather a percentage increase of 2%.
-
- -Winter's exponential smoothing is used if there is a seasonal
- aspect to the data (like retail sales which have a December peak).
- You will have to enter 4 quantities. The first is the smoothing
- coefficient for level. The second is for trend. The third is for
- seasonality. The fourth value is the period of seasonality. Note
- that this method should not be used with data fluctuating above
- and below zero. With data that go below zero, add a constant to
- the data to eliminate negative values. Then, after smoothing,
- subtract the constant.
-
- Interpolation
- B/STAT uses 4 forms of estimating unavailable data.
-
- -Simple linear interpolation requires that you simply select the
- variable.
-
- -Geometric interpolation. Basically the same as linear
- interpolation except that the assumption is that the points are
- connected by a multiplicative relationship rather than additive.
-
- -Lagrangian interpolation requires two variables: an "X" variable
- and a "Y" variable. There can be no missing "X" values. This
- can be slow with a large data set, since each point is used in
- estimating missing data.
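- The estimate for a missing point can be sketched directly from the
- Lagrange formula (illustration only; the double loop over all known
- points is what makes the method slow on large data sets):

```python
def lagrange_estimate(xs, ys, x):
    """Estimate y at x with the Lagrange interpolating polynomial.
    Every known point enters every estimate, which is why the method
    slows down on large data sets."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

# Three points on y = x**2; interpolating at x = 3 recovers 9.
y = lagrange_estimate([1.0, 2.0, 4.0], [1.0, 4.0, 16.0], 3.0)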
-
- -Cubic spline interpolation assumes that the data set in the
- selected variable consists of evenly-spaced observations.
-
- EXTRACT
- These selections allow you to reduce the size of the data set. The
- first option sums the data. For example, if you want to get yearly
- totals from a data set of monthly data, you can extract summed data
- and reduce the data by a factor of 12. Each element would then be
- a yearly total. In the non-summed case, only every 12th value would
- be left. No summing would be done. This is useful if you want to
- look at subsets in isolation.
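- The monthly-to-yearly example can be sketched as (an illustration;
- in the non-summed case I keep the last value of each block, since
- the text does not say which of the 12 B/STAT retains):

```python
def extract(data, factor, summed=True):
    """Reduce a series by `factor`: either total each block of
    `factor` values (monthly -> yearly totals) or keep one value per
    block (the last; the text does not say which one B/STAT keeps)."""
    blocks = [data[i:i + factor] for i in range(0, len(data), factor)]
    if summed:
        return [sum(b) for b in blocks]
    return [b[-1] for b in blocks]

monthly = list(range(1, 25))                  # two years of monthly data
yearly = extract(monthly, 12)                 # yearly totals
sampled = extract(monthly, 12, summed=False)  # one value per year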
-
- MISCELLANEOUS
- This menu has three procedures, in addition to the usual help
- selection.
- -Crosstabs is used to summarize data contained in two or
- three variables. It produces a count for each combination of values
- in the chosen variables. For example, you may have data on the
- height and weight of a group of army recruits. You could use
- crosstabs to find out the number in each height and weight
- classification, where these could be height in 2-inch increments
- and weight in 5-pound increments. It is most commonly used in
- market research for crosses, such as between age 30 and 34 and
- earning between 20,000 and 30,000 dollars per year.
-
- You first select the variables to use in the crosstab. If you
- select two, then a 2-way crosstab is done. If three, then a 3-way
- crosstab is done. Next, you select the break points for the
- classes in each variable. There may be up to 14 breakpoints,
- giving a maximum of 15 classes for each variable. You need only
- type in as many breakpoints as there are for a specific
- variable, and leave the rest blank. The number of break points can
- be different for each variable. Note that the lower class includes
- the break point value. Thus, a breakpoint of 200 pounds would put
- 200-pound people in the lower class and 200.01 pound people in the
- higher class. The program will print out the results. If you want,
- you may replace the data in memory with the summarized totals.
- This can be quite useful if you then want to perform a Chi square
- test, type 2, on the result to see if there are any significant
- relationships.
- One factor crosstabs are available. If you choose only one variable
- then the program will generate a new data matrix composed of 2
- variables only. There will be one entry for each unique value in
- the chosen variable. The second variable will be the number of
- occurrences of that value in the original variable. This is a
- destructive process which erases all original data.
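- The counting behind a 2-way crosstab can be sketched as follows
- (illustration only; the height/weight numbers are invented, and
- bisect_left reproduces the boundary rule above, where a value equal
- to a breakpoint falls in the lower class):

```python
from bisect import bisect_left
from collections import Counter

def crosstab(var1, var2, breaks1, breaks2):
    """Two-way crosstab: count observations in each class pair.
    bisect_left puts a value equal to a breakpoint into the lower
    class, matching the boundary rule described above."""
    counts = Counter()
    for a, b in zip(var1, var2):
        counts[(bisect_left(breaks1, a), bisect_left(breaks2, b))] += 1
    return counts

heights = [68, 70, 70, 73]       # one breakpoint at 70 (70 is "lower")
weights = [150, 200, 210, 180]   # one breakpoint at 200
table = crosstab(heights, weights, breaks1=[70], breaks2=[200])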
-
- -Difference is a rather simple process. The difference of a
- variable is simply the amount of its change from one period to the
- next. Sometimes some procedures will work better on the change in
- a variable rather than the variable itself. This is especially
- true in Box Jenkins analysis. You merely supply the variable to
- difference and the variable into which to place the result.
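- The operation itself is one line (a sketch of what the procedure
- computes; note the result is one observation shorter than the
- input):

```python
def difference(x):
    """First difference: the change from each period to the next.
    The result is one observation shorter than the input."""
    return [b - a for a, b in zip(x, x[1:])]

d = difference([3.0, 5.0, 4.0, 7.0])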
-
- -Box Cox Transforms are used to transform a variable so that the
- values are normally distributed. The Box Cox procedure uses a
- variable called "lambda". You must provide the minimum lambda to
- test as well as the maximum. You also must specify the number of
- steps to use in going from the minimum to the maximum. The
- program will select the best value of lambda from the ones that
- it tests. The variable to test must have all values greater than
- zero. You also specify a variable into which the result will be
- placed.
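- The lambda search can be sketched as a grid scan that keeps the
- value with the highest normal log-likelihood (a hedged sketch:
- B/STAT's exact selection criterion is not stated in this text, and
- the sample data are invented):

```python
import math

def box_cox(value, lam):
    """Box-Cox transform of one positive value for a given lambda."""
    if lam == 0.0:
        return math.log(value)
    return (value ** lam - 1.0) / lam

def best_lambda(data, lam_min, lam_max, steps):
    """Grid search from lam_min to lam_max in `steps` steps, keeping
    the lambda whose transform maximises the normal log-likelihood.
    A sketch of the search; B/STAT's exact criterion is not stated."""
    best_lam, best_ll = lam_min, -math.inf
    n = len(data)
    log_sum = sum(math.log(v) for v in data)
    for k in range(steps + 1):
        lam = lam_min + (lam_max - lam_min) * k / steps
        t = [box_cox(v, lam) for v in data]
        mean = sum(t) / n
        var = sum((v - mean) ** 2 for v in t) / n
        ll = -0.5 * n * math.log(var) + (lam - 1.0) * log_sum
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

# Squares of evenly spaced values: a lambda near 0.5 roughly
# linearises them, so the chosen value lands between 0 and 1.
data = [v * v for v in [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]]
lam = best_lambda(data, 0.0, 2.0, 20)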
-
-
-